feat(build): per-app python_version with cross-resource validation#322
feat(build): per-app python_version with cross-resource validation#322
Conversation
|
Promptless prepared a documentation update related to this change. Triggered by runpod/flash#322 This PR introduces per-app Python version selection for Flash, allowing users to target Python 3.10, 3.11, or 3.12. The docs update covers the new |
2548f64 to
6138ae2
Compare
runpod-Henrik
left a comment
There was a problem hiding this comment.
QA Review — PR 322
Tested against v1.14.0 baseline. Four findings.
Finding 1: SDK ships before worker images exist
GPU_PYTHON_VERSIONS and CPU_PYTHON_VERSIONS now include "3.10" and "3.11", but these image tags don't exist until a separate flash-worker PR merges. A user who deploys with --python-version 3.10 or --python-version 3.11 today will silently receive the 3.12 image (or the deploy will fail with an opaque image-pull error), depending on how RunPod handles missing tags. The CLI accepts the flag and emits no warning.
Question: Is there a plan to block --python-version 3.10 and --python-version 3.11 at the CLI until the worker images exist? Or will this ship as-is?
Finding 2: First deploy after SDK upgrade may trigger a spurious rolling release
python_version is now extracted from resource configs and stamped into the manifest (via target_python_version). For any project that had python_version set on a resource before this PR, the manifest fingerprint will differ from the deployed fingerprint even though the user changed nothing. This will trigger the rolling release recycle on first deploy after upgrade.
Whether this is acceptable depends on whether the behaviour is documented. Nothing in the changelog or the _reconcile_python_version docstring mentions it.
Question: Is this a known and acceptable tradeoff? If so, it should be called out in the release notes.
Finding 3: self.python_version is set twice in ManifestBuilder
In manifest.py, self.python_version is assigned in __init__ and then overwritten inside build(). Any code that reads self.python_version between construction and build() would see the wrong value. Currently nothing does, but the pattern is fragile. The __init__ assignment is dead assignment — if build() always overwrites it, the __init__ line should be removed.
Finding 4: Isolation fix is scoped to two tests only
preserve_runpod_flash_modules is added to two tests in test_dotenv_loading.py. If the same dotenv re-import issue surfaces in other test files (the underlying cause is module-level side effects in runpod_flash/__init__.py), those tests will still be fragile. This fix is targeted, not structural — that's fine for now but worth noting if isolation artifacts appear elsewhere.
No blockers on the core logic. _reconcile_python_version() reads cleanly, the validation cases are thorough (9 new tests cover the main paths), and the CLI flag wiring is straightforward.
There was a problem hiding this comment.
Pull request overview
This PR extends Flash’s worker-image Python version support to 3.10/3.11/3.12 and adds app-level reconciliation/validation so a single deploy artifact (one tarball) can’t mix Python ABIs across resources.
Changes:
- Expand supported GPU/CPU Python versions to
3.10,3.11, and3.12, keeping3.12as the default. - Add manifest reconciliation to enforce a single app-level Python version (with
--python-versionoverride support inflash build/flash deploy) and stamp a consistenttarget_python_versionper resource. - Update unit tests and docs to reflect the new version-selection behavior and improve test isolation around module reloading.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/runpod_flash/core/resources/constants.py |
Expands supported Python versions and updates related documentation/comments. |
src/runpod_flash/cli/commands/build_utils/manifest.py |
Adds cross-resource python_version reconciliation and stamps target_python_version consistently. |
src/runpod_flash/cli/commands/build.py |
Threads --python-version through build to the manifest builder and uses manifest for wheel ABI selection. |
src/runpod_flash/cli/commands/deploy.py |
Adds --python-version flag and passes it through to build. |
src/runpod_flash/cli/docs/flash-build.md |
Documents --python-version (but currently also documents flags not implemented by the CLI). |
src/runpod_flash/cli/docs/flash-deploy.md |
Documents --python-version for deploy. |
docs/Flash_Deploy_Guide.md |
Adds end-user guidance on Python version selection and tradeoffs. |
docs/Deployment_Architecture.md |
Updates architecture notes to reflect targeting the app’s python_version for wheel selection. |
tests/unit/cli/commands/build_utils/test_manifest.py |
Adds unit tests covering reconciliation rules and error cases. |
tests/unit/core/resources/test_constants.py |
Updates assertions for the expanded version sets. |
tests/unit/resources/test_live_serverless.py |
Updates live resource tests to accept the expanded supported versions. |
tests/unit/test_dotenv_loading.py |
Adds a fixture to prevent sys.modules cleanup from leaking across tests (flake reduction). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -30,6 +30,7 @@ flash build [OPTIONS] | |||
| - `--output, -o`: Custom archive name (default: artifact.tar.gz) | |||
| - `--exclude`: Comma-separated packages to exclude (e.g., 'torch,torchvision') | |||
| - `--preview`: Launch local test environment after successful build (auto-enables `--keep-build`) | |||
| - `--python-version`: Target Python version for worker images (`3.10`, `3.11`, or `3.12`). Overrides per-resource `python_version`. Default: value declared on resource configs, or 3.12 if none set. | |||
There was a problem hiding this comment.
The flash build docs list --keep-build and --preview, but build_command in cli/commands/build.py does not expose either option (and the build directory is currently always kept on success). Please either implement these flags or update the docs to match the actual CLI behavior (and how --preview is supported via flash deploy).
…E-2827) Expand GPU and CPU worker images to support Python 3.10, 3.11, and 3.12. Flash apps ship as one tarball so every resource must share a Python version; the build step now reconciles per-resource python_version declarations into a single app-level value or accepts an explicit --python-version override. - constants.py: expand GPU_PYTHON_VERSIONS / CPU_PYTHON_VERSIONS tuples - manifest.py: add _reconcile_python_version; stamp target_python_version on every resource; raise on conflicting declarations - build.py / deploy.py: add --python-version CLI flag, thread through run_build and ManifestBuilder - docs: document per-app python_version, cold-start tradeoff for non-3.12 GPU images, and the 3.10 EOL window - tests/unit/test_dotenv_loading.py: add preserve_runpod_flash_modules fixture so module-deletion tests don't leak stale module references into sibling test files (unblocks deterministic test ordering)
The effective app-level Python version is assigned by build() via _reconcile_python_version(). Pre-build access always saw a value that build() would overwrite — misleading and never read. Initialize to None so readers fail fast if they touch the attribute before build().
4b93537 to
ff6aed9
Compare
Neither flag is exposed by flash/cli/commands/build.py::build_command; --preview now lives on flash deploy. Aligns the docs with actual CLI behavior — build directory is always retained on success, so the --keep-build toggle is gone. Caught by Copilot review on PR #322.
Summary
Expand GPU and CPU worker images to support Python 3.10, 3.11, and 3.12. Flash apps ship as one tarball so every resource in an app must share a Python version; the build step now reconciles per-resource
python_versiondeclarations into a single app-level value, or accepts an explicit--python-versionoverride onflash build/flash deploy.This is the SDK half of AE-2827. The flash-worker half (GPU Dockerfile parameterization for side-by-side torch install + CI matrix expansion + worker startup assertion) ships as a sibling PR once image builds can be validated end-to-end.
Changes
constants.py—GPU_PYTHON_VERSIONS/CPU_PYTHON_VERSIONSexpanded from("3.12",)to("3.10", "3.11", "3.12").DEFAULT_PYTHON_VERSIONstays 3.12.build_utils/manifest.py— new_reconcile_python_version(). Stamps the reconciled version onto every resource'starget_python_version; removes the now-redundant GPU/CPU special-case that hardcoded 3.12.build.py/deploy.py—--python-versionCLI flag threaded throughrun_buildintoManifestBuilder. Existing pip-wheel threading via_resolve_pip_python_versionkeeps working.docs/Flash_Deploy_Guide.md— new "Python version selection" section documenting per-resource declarations, the app-level override, the ~7 GB GPU cold-start tax for non-3.12, and the 3.10 EOL (2026-10-31) warning.tests/unit/test_dotenv_loading.py— addpreserve_runpod_flash_modulesfixture. Two existing tests force-reloadrunpod_flashviadel sys.modules[...]without restoring, leaking stale module references into sibling test files and causing flakes inTestRemoteClassWrapperPickle. The fixture snapshots and restores. Pre-existingmainhad 22 order-dependent failures in a-p no:randomlyrun; this branch now has 0 in the pickle cluster.Reconciliation rules
Resolution order when building the manifest:
--python-versionoverride (validated againstSUPPORTED_PYTHON_VERSIONS)python_versiondeclared across resource configsDEFAULT_PYTHON_VERSIONwhen no resource declares oneRaises
ValueErrorwhen resources declare conflicting versions, or when the override conflicts with a resource's explicit declaration.Test plan
make quality-checkpasses (85.7% coverage)TestReconcilePythonVersioncases cover override/conflict/default pathstest_constants.py+test_live_serverless.pypytest -p no:randomly tests/unit/pickle-cluster failures drop from 6 → 0